Who Wrote this Novel? Authorship Attribution across Three Languages
Abstract
Based on different definitions of writing style, various authorship attribution schemes have been proposed to identify the real author of a given text or text excerpt. In this article we first analyze the relative performance of word types and lemmas as the features used to represent styles and texts. As a second objective we compare two authorship attribution approaches: one based on principal component analysis (PCA), and a new method involving specific vocabulary (the Z score classification scheme). As a third goal we carry out our experiments on three corpora written in three different languages (English, French, and German). In the first we categorize 52 text excerpts, taken from 19th century English novels and written by nine authors. In the second we work with 44 segments taken from French novels (mainly 19th century) written by eleven authors. In the third we extract 59 German text excerpts written by 15 authors and covering the 19th and early 20th centuries. Based on these collections and the two features (word types or lemmas), we demonstrate that the Z score method performs better than PCA, and that lemmas tend to produce slightly better performance than word types.
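The abstract names a Z score classification scheme built on specific vocabulary but does not give its formula. The following is a minimal sketch of one way such a scheme could work, not the authors' exact formulation. It assumes that term counts follow a binomial model, that each author is profiled against the texts of all other authors, and that a query text goes to the author with the smallest mean squared difference between Z score vectors. The function names `z_scores` and `attribute` and the toy token lists are hypothetical.

```python
from collections import Counter
from math import sqrt


def z_scores(part_tokens, rest_tokens):
    """Z score of every term in one part of a corpus against the whole corpus.

    Assumption (not from the abstract): counts are binomial, so a term with
    corpus-wide relative frequency p has expected count n * p and variance
    n * p * (1 - p) in a part containing n tokens.
    """
    part, rest = Counter(part_tokens), Counter(rest_tokens)
    n_part = len(part_tokens)
    n_total = n_part + len(rest_tokens)
    scores = {}
    for term in set(part) | set(rest):
        p = (part[term] + rest[term]) / n_total
        variance = n_part * p * (1 - p)
        if variance > 0:
            scores[term] = (part[term] - n_part * p) / sqrt(variance)
        else:
            scores[term] = 0.0  # degenerate case: empty part or single-term corpus
    return scores


def attribute(query_tokens, texts_by_author):
    """Return the author whose Z score profile is closest to the query text."""
    all_tokens = [t for toks in texts_by_author.values() for t in toks]
    query_z = z_scores(query_tokens, all_tokens)
    best_author, best_dist = None, float("inf")
    for author, tokens in texts_by_author.items():
        # Profile this author against the texts of every other author.
        rest = [t for other, toks in texts_by_author.items()
                if other != author for t in toks]
        author_z = z_scores(tokens, rest)
        terms = set(query_z) | set(author_z)
        # Hypothetical distance: mean squared difference of Z scores.
        dist = sum((query_z.get(t, 0.0) - author_z.get(t, 0.0)) ** 2
                   for t in terms) / len(terms)
        if dist < best_dist:
            best_author, best_dist = author, dist
    return best_author


if __name__ == "__main__":
    # Toy data; the tokens could equally be lemmas instead of word types.
    profiles = {
        "A": "the sea the sea the ship the storm".split(),
        "B": "a garden a rose a garden a bird".split(),
    }
    print(attribute("the sea the storm".split(), profiles))
```

A term that an author overuses gets a large positive Z score and an underused term gets a negative one, so the profile captures vocabulary specific to that author. Switching between word types and lemmas only changes how the token lists are produced, not the scoring.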
Similar resources
Authorship Attribution Via Combination of Evidence
Authorship attribution is the process of determining who wrote a particular document. We have found that different systems work well for particular sets of authors but not others. In this paper, we propose three authorship attribution systems, based on different ways of combining existing methodologies. All systems show better effectiveness than the state-of-the-art methods.
Mixture of Experts Authorship Attribution Notebook for PAN at CLEF 2012
For problems A, B, C, D, I, and J we used three authorship attribution techniques: a distance-based nearest neighbor, an SVM, and a method that used a distance-based NN approach to classify sections of a document and then classified the document based on who wrote the majority of it. These three techniques were then considered experts and each given a vote to determine the author of each document. For probl...
Proving and Improving Authorship Attribution Technologies
Who wrote Primary Colors? Can a computer help us make that call? Despite a century of research, statistical and computational methods for authorship attribution are neither reliable, well-regarded, widely used, nor well understood. This paper presents a survey of the current state of the art as well as a framework for uniform and unified development of a tool to apply the state of the art, despit...
Who Wrote This Code? Identifying the Authors of Program Binaries
Program authorship attribution—identifying a programmer based on stylistic characteristics of code—has practical implications for detecting software theft, digital forensics, and malware analysis. Authorship attribution is challenging in these domains where usually only binary code is available; existing source code-based approaches to attribution have left unclear whether and to what extent pr...
Authorship Attribution: A Comparative Study of Three Text Corpora and Three Languages
The first objective of this paper is to carry out three experiments intended to evaluate authorship attribution methods based on three test collections available in three different languages (English, French, and German). In the first we represent and categorize 52 text excerpts written by nine authors and taken from 19th century English novels. In the second we work with 44 segments from French n...